Enterprise Database Systems
Streaming Data Architectures
Streaming Data Architectures: An Introduction to Streaming Data
Streaming Data Architectures: Processing Streaming Data

Streaming Data Architectures: An Introduction to Streaming Data

Course Number:
it_dssdardj_01_enus
Lesson Objectives

  • Course Overview
  • recognize the differences between batch and streaming data and the types of streaming data sources
  • list the steps involved in processing streaming data, the transformation of streams, and the materialization of the results of the transformation
  • describe how the use of a message transport decouples a streaming application from the sources of streaming data
  • describe the techniques used in Spark 1.x to work with streaming data and how they contrast with processing batch data
  • recall how structured streaming in Spark 2.x is able to ease the task of stream processing for the app developer
  • compare how stream processing works in Spark 1.x and 2.x
  • recognize how triggers can be set up to periodically process streaming data and describe the various output modes available to publish the results of stream processing
  • recognize the key aspects of working with structured streaming in Spark
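
The batch-versus-streaming contrast in the objectives above can be sketched in plain Python. This is a hypothetical simulation of the two processing models, not actual Spark code: a batch job sees the complete dataset up front, while a streaming job processes records incrementally as they arrive, maintaining running state.

```python
# Hypothetical pure-Python sketch contrasting batch and streaming processing.
# It only illustrates the processing models; Spark's APIs look different.

def batch_total(records):
    """Batch: the complete dataset is available up front; process it once."""
    return sum(records)

def streaming_totals(record_stream):
    """Streaming: records arrive one at a time; maintain running state
    and materialize an updated result after every arrival."""
    total = 0
    for record in record_stream:
        total += record
        yield total  # incremental result after each new record

data = [3, 1, 4, 1, 5]
print(batch_total(data))             # one final answer: 14
print(list(streaming_totals(data)))  # a result per event: [3, 4, 8, 9, 14]
```

The streaming variant never "finishes" in the way the batch one does; each yielded value is a materialization of the result so far, which is the essence of the stream-processing model the course introduces.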

Overview/Description

Spark, an analytics engine that runs on Hadoop, can be used for working with big data, data science, and processing batch and streaming data. Explore the fundamentals of working with streams using Spark in this 9-video course. Key concepts covered here include the differences between batch and streaming data and the types of streaming data sources; processing streaming data, transformation of streams, and materialization of the results of the transformation; and how the use of a message transport decouples a streaming application from the sources of streaming data. Next, learn about techniques used in Spark 1.x to work with streaming data and how they contrast with processing batch data; how structured streaming in Spark 2.x is able to ease the task of stream processing for the app developer; and how stream processing works in both Spark 1.x and 2.x. Finally, learn how triggers can be set up to periodically process streaming data and the various output modes available to publish the results of stream processing; and the key aspects of working with structured streaming in Spark.
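
As a rough illustration of the triggers and output modes mentioned above, the following pure-Python sketch (an analogy, not Spark's actual implementation) runs a word count over micro-batches. Each loop iteration stands in for a trigger firing, and the sketch shows what Complete and Update modes would publish after each trigger.

```python
from collections import Counter

# Hypothetical simulation of structured-streaming output modes.
# One loop iteration = one trigger firing on a new micro-batch of words.

def run_triggers(micro_batches):
    counts = Counter()  # running aggregation state across triggers
    for batch in micro_batches:
        counts.update(batch)
        complete = dict(counts)                      # Complete: whole result table
        update = {w: counts[w] for w in set(batch)}  # Update: only rows changed now
        # Append mode emits only rows that will never change again, so it
        # does not apply to a running aggregate like this one.
        yield complete, update

batches = [["spark", "stream"], ["spark"]]
for i, (complete, update) in enumerate(run_triggers(batches), 1):
    print(f"trigger {i}: complete={complete} update={update}")
```

After the second trigger, Complete mode republishes every word's count while Update mode publishes only the row for "spark", the one row that changed; that difference in what gets pushed to the sink is exactly what the output-mode objective above covers.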



Target

Prerequisites: none

Streaming Data Architectures: Processing Streaming Data

Course Number:
it_dssdardj_02_enus
Lesson Objectives

  • Course Overview
  • install the latest available version of PySpark
  • configure a streaming data source using Netcat and write an application to process the stream
  • describe the effects of using the Update mode for the output of your stream processing application
  • write an application to listen for new files being added to a directory and process them as soon as they come in
  • compare the Append output mode to the Update mode and distinguish between the two
  • develop applications that limit the files processed in each trigger and use Spark's Complete mode for the output
  • perform aggregation operations on streaming data using the DataFrame API
  • work with Spark SQL in order to process streaming data using SQL queries
  • define and apply standard, re-usable transformations for streaming data
  • recall the key ways to use Spark for streaming data and explore the ways to process streams and generate output
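
The file-listening pattern in the objectives above can be approximated in plain Python. This is a hypothetical sketch of the idea, not Spark's file source: poll a directory, track which files have already been seen, and process each new file exactly once.

```python
import os

# Hypothetical sketch of a file-based streaming source: each poll picks up
# files added since the last poll and processes them exactly once.

def poll_new_files(directory, seen):
    """Return the contents of files not yet processed, then mark them seen."""
    results = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if path not in seen and os.path.isfile(path):
            with open(path) as f:
                results.append(f.read())
            seen.add(path)
    return results
```

Spark's actual file source works similarly at a high level, and its `maxFilesPerTrigger` option caps how many new files a single trigger consumes, which is the "limit the files processed in each trigger" behavior listed above.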

Overview/Description

Spark is an analytics engine that runs on Hadoop and works with big data, data science, and both batch and streaming data processing. In this 11-video course, discover how to develop applications in Spark to work with streaming data and explore different ways to process streams and generate output. Key concepts covered here include installing the latest version of PySpark; configuring a streaming data source using Netcat and writing applications to process the stream; and the effects of using the Update mode for the output of your stream processing application. Learn how to write an application to listen for new files added to a directory; compare the Append output mode to the Update mode and distinguish between the two; and develop applications that limit the files processed in each trigger and use Spark's Complete mode for output. Next, learners perform aggregation operations on streaming data with the DataFrame API (application programming interface); work with Spark SQL to process streaming data by using SQL queries; and learn the key ways to use Spark for streaming data and the ways to process streams and generate output.
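
The idea of querying a stream with SQL, covered above, can be mimicked with the standard-library sqlite3 module. This is purely an analogy (Spark SQL instead registers a streaming DataFrame as a temporary view and runs queries incrementally): append each micro-batch to a table and rerun an aggregate query after every trigger.

```python
import sqlite3

# Analogy only: Spark SQL runs SQL incrementally over a streaming DataFrame;
# here we append each micro-batch to a table and re-query the whole table.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (word TEXT)")

def process_batch(batch):
    """Ingest one micro-batch, then rerun the aggregate query."""
    conn.executemany("INSERT INTO events VALUES (?)", [(w,) for w in batch])
    rows = conn.execute(
        "SELECT word, COUNT(*) FROM events GROUP BY word ORDER BY word"
    ).fetchall()
    return dict(rows)

print(process_batch(["spark", "sql"]))  # {'spark': 1, 'sql': 1}
print(process_batch(["spark"]))         # {'spark': 2, 'sql': 1}
```

Each call plays the role of one trigger: the query text stays fixed while its result evolves as data arrives, which is the working model behind the Spark SQL objectives in this course.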



Target

Prerequisites: none
